Method and apparatus for compiling computer programs with interprocedural register allocation

ABSTRACT

Optimization techniques are implemented by means of a program analyzer used in connection with a program compiler to optimize usage of limited register resources in a computer processor. The first optimization technique, called interprocedural global variable promotion, allows the global variables of a program to be accessed in common registers across a plurality of procedures. Moreover, a single common register can be used for different global variables in distinct regions of a program call graph. This is realized by identifying subgraphs of the program call graph, called webs, where the variable is used. The second optimization technique, called spill code motion, involves the identification of regions of the call graph, called clusters, that facilitate the movement of spill instructions to procedures which are executed relatively less often. This decreases the overhead of register saves and restores which must be executed for procedure calls.

This is a continuation of application Ser. No. 07/435,914 filed on Nov. 13, 1989, now U.S. Pat. No. 5,428,793.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material to which a claim of copyright protection is made. The owner has no objection to the facsimile reproduction of the patent document or patent disclosure as it appears in the Patent and Trademark Office patent file or records, but reserves all other rights whatsoever.

BACKGROUND OF THE INVENTION

This invention relates to computer program compilers and more particularly to a computer program compilation system that supports register allocation across procedure and compilation unit boundaries where there are global variables and a limited number of registers for storage and manipulation of data.

In a traditional compiler, register allocation is performed on each procedure one at a time. In some compilers, the register allocator has access to register allocation information from other procedures within the same compilation unit. The compiler can use this information to improve the register allocation in the callers of these routines. This type of technique is limited in scope to the procedures of a single compilation unit.

The traditional intraprocedural register allocation process is effective, but in the absence of interprocedural information the following situations occur:

• Local values in different procedures are assigned to the same register. As a result, procedures must execute code to save and restore these registers in order to preserve the values needed by the calling procedure.

• Global variables are referenced out of different registers in different procedures. This requires a modified value of a global variable to be stored to memory before any procedure call, and loaded back from memory before any subsequent use of that variable. This also requires each procedure which references that variable to load the variable from memory if it is used before being redefined, and to store the global variable to memory before the exit point if the variable is modified within that procedure.

For most programming languages, improving this situation is complicated by the need to support multiple compilation units. For example, if one wishes to keep a certain global variable in a register when compiling module A, one must ensure that any reference to that variable in a different module uses the same register.

One possible solution is to delay register assignment until link time, when the code for the entire application is visible. This solution is difficult to implement with traditional compiler architectures, however, because of the need for dataflow and live range information at register allocation time. Moreover, computing this information would create an unreasonable delay each time a user needed to re-link an application.

There are two known significant research efforts that have addressed the weaknesses of procedure-at-a-time register allocation. The first was carried out at DEC's Western Research Lab in 1986 and described by David W. Wall in an article entitled "Global Register Allocation At Link Time" in the Proceedings of the SIGPLAN '86 Symposium On Compiler Construction, SIGPLAN Notices, Vol. 21, No. 7, July 1986, pages 264-275. In this technique, the compiler does a simple register allocation on each procedure and generates register relocation information for the linker. The user may optionally enable interprocedural register allocation at link time. To promote a variable to a register, the linker only needs to follow the prescribed relocation actions. This technique showed some good results. Some benchmarks improved by as much as 8% on a 64-register RISC machine, with a majority of the benefit attributed to the promotion of global variables.

Global variable promotion is an optimization technique where memory references to global variables are converted into register references. In effect, the global variable is promoted from being a memory object to a register object. Traditional compilers sometimes promote global variables to registers locally within a procedure. Such locally promoted global variables are still accessed out of memory across procedures. Before procedure calls and at the exit point, the compiler inserts instructions to store the register containing the promoted global variable back to memory. Similarly, just after procedure returns and at the entry point, the optimizer inserts instructions to load the promoted global variable from memory to register.

The second significant research effort was produced at MIPS Computer Systems and described by Fred C. Chow in "Minimizing Register Usage Penalty at Procedure Calls" published in Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation, July 1988, pages 85-94 and also in an article authored with others in "Cross-Module Optimizations: Its Implementation and Benefits" published in the Proceedings of the Summer 1987 USENIX Conference, pages 347-356. In the MIPS system, the multiple compilation unit problem is solved by exposing an intermediate code representation to the user. Then, instead of linking object code, the user must link the intermediate code files into a single, large intermediate program file. The intermediate code linker then completes the code generation and optimization process. As part of this process, the optimizer tries to minimize register spill by performing register allocation on procedures in a reverse hierarchical order and propagating register usage information upwards in the call graph. This technique showed generally positive results, although there were exceptions noted. In one example discussed, this process resulted in object code which executed more slowly than a version compiled without interprocedural register allocation. There are other computer systems and compilers that have been implemented which use a similar technique within a single compilation unit.

On many contemporary computer architectures, machine registers are divided by software conventions into three classes: status registers, caller-saves registers, and callee-saves registers.

• Status registers are registers which are designated to hold specific values which may not be used to hold variables or other temporary values. Examples include a stack pointer and a global data pointer.

• Caller-saves registers are registers which may be used within a procedure to hold values, but these values are not guaranteed to remain unchanged after executing a call to another procedure. These registers may be used by a procedure without being preserved in memory before they are used. The name "caller-saves" refers to the fact that the caller of a procedure must save any needed values in these registers so the called routine may use those registers.

• Callee-saves registers are registers which may be used within a procedure to hold values, and these values are guaranteed to remain unchanged after executing a call to another procedure. However, the values in these registers must be spilled before they are used and then restored to the register before exiting the procedure. The name "callee-saves" refers to the fact that the called routine is responsible for saving these registers before they are used.

In the absence of interprocedural information, callee-saves register spilling is necessary in every procedure which needs to use a register of that class. This creates significant overhead in many programs.

Some other references to related work include:

"LISP on a Reduced Instruction Set Processor: Characterization and Optimization", by P. A. Steenkiste of Stanford University Computer Systems Laboratory, PhD Thesis, Chapter 5, March 1987. This approach is similar to that of MIPS, except that it reverts to ordinary intraprocedural register allocation when the interprocedural registers are exhausted in upper regions of the call graph.

"Data Buffering: Run-Time Versus Compile Time Support" by Hans Mulder, Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, Apr. 3-6, 1989, pages 144-151. This approach is also similar to that of MIPS except that it is limited in scope to single compilation units.

"The Impact of Interprocedural Analysis and Optimization in the R^(n) Programming Environment" by Keith D. Cooper, Ken Kennedy, and Linda Torczon of Rice University. Published in the ACM Transactions on Programming Languages and Systems, October 1986, pages 491-523. This paper describes a program compiler which computes interprocedural optimization information, but does not address the register allocation problem.

Hewlett-Packard's Apollo Division uses an interprocedural register allocation scheme within a single compilation unit in their DN10000 architecture compilers. As with the references above, except for the DEC paper, this approach does not attempt to keep global variables in registers across procedures.

What is needed is a method and apparatus for optimizing register usage where there is a limited number of available register resources in a computer processor and where a plurality of procedures and variables are involved.

SUMMARY OF THE INVENTION

According to the invention, two specific optimization techniques are implemented by means of a program analyzer used in connection with a program compiler to optimize usage of limited register resources in a computer processor. The first optimization technique, called interprocedural global variable promotion, allows the global variables of a program to be accessed in common registers across a plurality of procedures. Moreover, a single common register can be used for different global variables in distinct regions of a program call graph. This is realized by identifying subgraphs of the program call graph, called webs, where the variable is used. The second optimization technique, called spill code motion, involves the identification of regions of the call graph, called clusters, that facilitate the movement of spill instructions to procedures which are executed relatively less often. This decreases the overhead of register saves and restores which must be executed for procedure calls.

The program analyzer according to the invention reads summary files produced by a compiler modified to create summary files containing, for each procedure of a source code file, global variable usage information, register need information, and names of called procedures. The compiler is run on each source code file separately to produce separate summary files. The program analyzer computes interprocedural register allocation information from the summary files and writes it out to a program database file. The program analyzer builds a single program call graph (PCG) from all the summary files. The PCG consists of a set of nodes, each representing a procedure, interconnected by directional edges, each representing a call from a first procedure to a second procedure.
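By way of illustration only, the merging of per-file summary records into a single PCG might be sketched as follows in Python. The record layout (a dictionary with the keys `name` and `calls`) and the function name `build_call_graph` are assumptions made for this sketch; the actual summary file format is not specified here.

```python
from collections import defaultdict


def build_call_graph(summary_files):
    """Merge summary records from several files into one call graph.

    summary_files: a list of files, each modelled as a list of records.
    Each record is a dict with the hypothetical keys 'name' (procedure
    name) and 'calls' (names of called procedures).
    Returns (successors, predecessors), each mapping node -> set of nodes.
    """
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for records in summary_files:
        for rec in records:
            caller = rec["name"]
            # Touch both maps so that leaf and root procedures still appear.
            successors[caller]
            predecessors[caller]
            for callee in rec["calls"]:
                successors[caller].add(callee)    # directional edge caller -> callee
                predecessors[callee].add(caller)
                successors[callee]
    return dict(successors), dict(predecessors)
```

Keeping both a successor map and a predecessor map makes it cheap to walk the graph in either direction, which the web and cluster analyses both require.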

Traditional intraprocedural register allocators have in the past employed a data structure known as def-use chains to represent live ranges of variables. Def-use chains are analogous to spider webs linking equivalence classes of definitions and uses of a variable. Consequently, def-use chains are sometimes referred to as "webs". Webs have not been employed in the past in any interprocedural register allocation techniques. In other words, live ranges for global variables have not been computed across procedure boundaries.

In order to facilitate global variable promotion, the program analyzer identifies webs for selected global variables (the global variables selected are those variables that are eligible for assignment to an interprocedural machine register). A web for a single global variable is a collection of PCG nodes such that the global variable is accessed in at least one node of the web and such that, for each node in the web, the global variable is not accessed in any ancestor node not in the web, and the global variable is not accessed by any descendant node not in the web. Multiple webs may be identified for a single global variable.
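The web definition above can be restated as a small predicate. The following Python sketch is illustrative only: the names `reachable` and `is_web` and the dictionary representation of the PCG are assumptions, and the check tests the stated conditions without attempting to find a minimal web.

```python
def reachable(start, edges):
    """Return every node reachable from start by following edges."""
    seen, stack = set(), list(edges.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return seen


def is_web(web, accessed_in, succ, pred):
    """Check the web conditions for one global variable.

    web:         candidate set of PCG nodes
    accessed_in: set of nodes that access the variable
    succ, pred:  node -> set of successor / predecessor nodes
    """
    # The variable must be accessed in at least one node of the web.
    if not (web & accessed_in):
        return False
    for node in web:
        # Descendants and ancestors of this node that lie outside the web.
        outside = (reachable(node, succ) | reachable(node, pred)) - web
        # None of them may access the variable.
        if outside & accessed_in:
            return False
    return True
```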

The program analyzer then prioritizes the webs according to frequency of use of the corresponding global variable within nodes of the web. The webs can optionally be prioritized based on profile information collected from an earlier run of the compiled source files.

The program analyzer then assigns the first available interprocedural machine register to the selected webs in priority order. In assigning interprocedural machine registers to the selected webs, the program analyzer ensures that webs that have common PCG nodes are assigned different machine registers. The interprocedural registers assigned to the webs are chosen from a limited subset of machine registers designated for preserving values across procedure calls (callee-saves registers).

Unlike previous approaches, this method of interprocedural register assignment allows a single register to be used for different purposes in distinct regions of the PCG. Specifically, a single interprocedural register can be used for the promotion of different global variables in different regions of the PCG. This allows a larger number of global variables to be promoted than the approach described by David W. Wall. Moreover, with our method, global variables are not promoted in the regions of the PCG in which the variable is not used.

Another function of the program analyzer is to facilitate the reduction of the overhead associated with saving and restoring callee-saves registers. This overhead is mitigated through spill code motion.

In order to facilitate spill code motion, the program analyzer first identifies clusters of nodes of the PCG. A cluster is a collection of PCG nodes such that there exists a unique root node of said cluster only through which every other node in the cluster can be called. Profile information collected from an earlier run of the compiled source files may be used to aid cluster identification.
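One possible reading of the cluster definition is sketched below in Python. It assumes that "only through which every other node can be called" means that every non-root member has all of its callers inside the cluster and is reachable from the root without leaving the cluster. The function name `is_cluster` and the graph representation are assumptions of this sketch.

```python
def is_cluster(root, members, succ, pred):
    """Check whether members forms a cluster rooted at root.

    members:    set of PCG nodes, expected to include root
    succ, pred: node -> set of successor / predecessor nodes
    """
    if root not in members:
        return False
    # No non-root member may be called from outside the cluster.
    for node in members - {root}:
        if not pred.get(node, set()) <= members:
            return False
    # Every member must be reachable from the root within the cluster.
    seen, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for callee in succ.get(node, ()):
            if callee in members and callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen == members
```

Under this reading, any call chain that enters the cluster from outside must pass through the root, which is what allows the root alone to carry the spill code.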

Interprocedural machine registers are assigned to each cluster node, according to the register need for the corresponding procedure and as restricted by the cluster organization.

The root node of each cluster is designated to execute machine instructions to preserve the values of the interprocedural registers assigned to nodes of that cluster. The machine instructions are executed upon calls to the cluster root node so that other nodes within the cluster need not execute the machine instructions.

The assignments of interprocedural machine registers to global variable webs and cluster nodes are finally written out by the program analyzer to the program database.

The summary files read by the program analyzer are produced in conjunction with intermediate files by a first phase of compiler operation. In a second phase, the program database file and the intermediate files created by the first phase are processed to produce individual object code files with the requisite interprocedural register assignments. In this stage, the pseudo-register operands used in the intermediate code (read from the intermediate files) are mapped into machine registers by a register allocator. The register allocator uses the interprocedural machine registers specified in the program database file to map certain pseudo-registers to machine registers.

For each procedure corresponding to a node of a web that was assigned an interprocedural machine register by the program analyzer, all memory references to the corresponding global variable are converted into interprocedural register references. At the root nodes of webs that were assigned an interprocedural machine register, instructions are added at the entry point to load the value of the corresponding global variable from memory into the interprocedural register and at the exit point to store the value back to memory. Additionally, machine instructions (spill code instructions) are added to preserve the value of the interprocedural register across calls to that root node. These transformations effectively promote the storage class of the selected global variables to register, i.e., they result in global variable promotion.

For each procedure corresponding to a node of a cluster that was assigned interprocedural machine registers by the program analyzer, certain pseudo-registers are mapped into those interprocedural machine registers instead of the ordinary callee-saves registers. Machine instructions are added at the root nodes of clusters designated to preserve the values of the interprocedural registers assigned to nodes of that cluster. These transformations effectively result in spill code motion.

The invention will be better understood by reference to the following detailed description in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer system incorporating the invention.

FIG. 2 is a block diagram of a computer program compilation system in accordance with the invention.

FIG. 3 is a block diagram of a register allocator used in accordance with the invention.

FIG. 4 is an internal view of the compilation system operational in accordance with the invention.

FIG. 5 illustrates a sample program call graph and the webs identified for three global variables.

FIG. 6 is a table describing the usage of the global variables at different nodes of the program call graph illustrated in FIG. 5.

FIG. 7 illustrates clusters of nodes within the same program call graph shown in FIG. 5.

FIG. 8 is another sample program call graph to illustrate a potential cluster identification problem.

FIG. 9 is another sample program call graph to illustrate a potential web identification problem.

DESCRIPTION OF SPECIFIC EMBODIMENTS

The invention is described by a specific embodiment as might be implemented on a Hewlett-Packard Precision Architecture computer system. The invention may be applied to any computer system where there is a finite number of registers and more particularly to computer systems where the number of registers available for computation and other processing is limited. This invention applies particularly to computer systems that categorize the available registers by software convention into callee-saves and caller-saves register partitions. The invention operates on source files, which are human readable high-level program fragments which comprise procedures.

Referring to FIG. 1, there is shown a computer system 10 incorporating the present invention. A compiler 12 is resident in a computer 14, including a processor, and a program analyzer 16 is resident in the computer 14 in accordance with the invention. The invention operates as follows:

A plurality of source code files 18, 20, 22 are supplied one at a time as input to the computer 14. The computer 14, through the use of the compiler 12 and the program analyzer 16, processes the source code files 18, 20, 22 to produce object files incorporating interprocedural register allocation optimization in accordance with the invention. The object files are linked into an executable file 24 by a linker 17. The executable file 24 may be run on the computer 14 or on another machine of the type for which the compiler has generated the executable file 24.

Referring to FIG. 2, there is shown a block diagram of a portion of an apparatus 26 for performing interprocedural register allocation in accordance with the invention. The apparatus 26 comprises a scanner 28 for sequentially reading source code files 18, 20, and 22, and a parser 30, coupled to receive from the scanner 28 lexically checked tokens of the source code files one at a time, for verifying syntactic correctness. In addition, a translator 32, coupled to receive the output of the parser 30, is operative to check for semantic correctness and to generate an intermediate code representation 34, 36, 38 for each of the source code files, respectively. The intermediate code is an ordered list of procedures translated from the source code. Each procedure consists of a sequence of machine instructions 108, each of which consists of an opcode 100 and pseudo-registers 102, 104, 106 as shown in FIG. 3. The intermediate code is temporarily stored for subsequent use or regenerated later. Each unit of the intermediate code is analyzed by a variable usage analyzer 40, a register need analyzer 42, and a procedure call analyzer 44.

The variable usage analyzer 40 identifies the global variables used in each procedure, and the nature of the access to the global variables.

The procedure call analyzer 44 identifies the names of the called procedures and the nature of the calls to each procedure.

The register need analyzer 42 estimates machine register need by examining the intermediate code.

The summary file generator 46 outputs the results produced by the analyzers to summary files 48, 50, 52 in accordance with the invention. Each summary file contains a record for each procedure in the corresponding source file. The summary files 48, 50, 52 are processed serially by a program analyzer 54 which analyzes the summary files as hereinbelow described. The results of the program analyzer's analysis are stored in a program database 56.

Optionally, profile information from a profile information file 58 is also provided to the program analyzer 54. The profile information is collected from an earlier run of the result of a previous compilation 60 of the same source files 18, 20, and 22.

Referring to FIG. 3, there is shown an illustration of a register allocator 62 in accordance with the invention. The register allocator 62 is operative to receive the information in the program database 56 and the intermediate code 34, 36, 38 in the form of procedures. Each procedure consists of a sequence of instructions 108, each of which consists of an opcode 100 and pseudo-registers 102, 104, and 106. The register allocator 62 is responsible for converting memory references to promoted global variables into interprocedural register references. The register allocator 62 is also responsible for replacing all of the pseudo-registers 102, 104, 106 in the intermediate code with machine registers, some of which may be interprocedural registers 110, 113. The register allocator is also responsible for adding spill code 116, 118 to preserve the values of certain registers, including the interprocedural registers in root nodes of clusters and webs as hereinafter explained.

The result of register allocation for each set of procedures is output to an object file 120, which is in turn processed by the linker 17 (FIG. 1) into an executable file 24 (FIG. 1).

Referring to FIG. 4, there is shown an internal overview of the compilation system 122 with source files 18, 20, 22, summary files 48, 50, 52, intermediate files 34, 36, 38, program analyzer 54, profile information 58, program database 56, object files 120, the linker 17, run time libraries 19, and an executable file 24. The compiler comprises a first phase 12A and a second phase 12B. The compiler first phase 12A shown in FIG. 4 comprises the scanner 28, the parser 30, and the translator 32 shown in FIG. 2. The compiler second phase 12B shown in FIG. 4 includes the register allocator 62 shown in FIG. 3.

The collection of summary files 48, 50, 52 produced by the compiler first phase 12A is exposed to the program analyzer 54. The program analyzer is responsible for constructing a program call graph 130 (FIG. 5) for the program, identifying webs and clusters for the program call graph and making decisions about how interprocedural registers are to be allocated to the webs and clusters across that graph 130.

The program call graph 130 comprises nodes A, B, C, D, E, F, G, H, and I with directional edges 132, 134, 136, 138, 140, 142, 144, 146, and 148 connecting the nodes. Webs 150, 152, 154, and 156 are identified for the program call graph 130 in FIG. 5 and clusters 158, 160, and 162 are identified on the same program call graph 130 in FIG. 7 as hereinafter explained.

An interprocedural register is allocated to each selected web. A plurality of interprocedural registers are allocated to nodes of each cluster. These allocations are recorded in the form of register actions to a program database 56 of information about the program. The register allocator 62 uses information from the program database 56 to guide register allocation on each procedure in the intermediate file, one at a time.

The program analyzer 54 has been designed with the availability of profile information 58 in mind. In a particular embodiment, the implemented algorithms are not dependent on profile information, but some of the heuristics used in the program analyzer can be improved with profile information 58.

The Program Analyzer

The program analyzer 54 generates the program database 56 by completing the following tasks, in order.

[1] Build a program call graph 130 (FIG. 5).

[2] Identify webs for global variables and select webs for which interprocedural registers are to be allocated.

[3] Assign interprocedural registers to the webs selected in step 2.

[4] Identify clusters in the call graph.

[5] Pre-allocate registers to the nodes of each cluster identified in step 4 and identify the registers that must be spilled at the root of each cluster.

[6] Compute register actions for each node of the program call graph and write these out to the program database.

Global Variable Promotion Strategy

Interprocedural register promotion is an optimization that attempts to maintain certain important user global variables in registers across procedure boundaries.

The preferred embodiment makes use of the compilation system shown in FIG. 4 to automatically promote key global variables to registers across procedure boundaries. The compiler first phase 12A is responsible for communicating the callee-caller relationships and global variable usage for each of the source files 18, 20, 22 to the program analyzer 54 through summary files 48, 50, 52.

The program analyzer 54 collects the global variable usage information across all source files that make up the program. It determines which globals are ineligible for register promotion and, of the remaining global variables, selects those that are the most heavily referenced for register promotion.

The program analyzer 54 partitions the references to a global variable into disjoint webs and considers each web individually for register allocation.

A web, for the purposes of interprocedural global register allocation, identifies a minimal set of nodes in the call graph over which an eligible global variable could reside in an interprocedural callee-saves register (without requiring caller spill). Typically, the nodes of a web would form a connected subgraph whose nodes use or define the web variable. The subgraph is selected such that no ancestral node or descendant node in the call graph references the same web variable.

Each web can have one or more entry nodes. These entry nodes are basically the root nodes of the web subgraph. Typically, an entry node would contain references to the global variable while none of its direct predecessors would. All descendant nodes of each entry node that either have a local reference or a child reference to the global are part of the web. If a web is selected for promotion, at every node of the web, references to the global memory variable would be replaced by a register reference to a reserved interprocedural callee-saves register which would be unavailable for normal intraprocedural register coloring.

Additionally, at the entry nodes of the web, the global variable is loaded into the reserved callee-saves register on entry and stored back on exit (e.g. instructions 115 and 117 in FIG. 3). (Note that this allows a global variable whose value has been initialized at compile time to be safely promoted to a register.) For this approach to work, entry nodes are required to have ONLY external predecessor nodes and internal nodes are not allowed to have ANY external predecessors.

To meet this requirement, special entry nodes may need to be added to the web. For example, consider the program call graph 200 in FIG. 9. Suppose a global variable g3 is referenced in procedures S 202 and U 204, but not in procedure T 206. Suppose further that the web for g3 were to include only nodes S and U. If that web is assigned an interprocedural register, the instructions for procedure U will reference the value of g3 from that interprocedural register. When U is called from T, however, no instructions will have been executed to place the most recent value of g3 in that interprocedural register. This will be an incorrect translation of the source code. To solve this problem, procedure T is added to the web and designated to be an entry node.
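The repair described for FIG. 9 can be illustrated with a short Python sketch. It is not the web-building algorithm itself: the function name `close_web` is hypothetical, and the rule used here (pull in the external callers of any web node that has both internal and external callers) is merely one way to satisfy the stated entry-node requirement.

```python
def close_web(web, pred):
    """Grow a candidate web until the entry-node requirement holds.

    web:  initial set of nodes that reference the global variable
    pred: node -> set of predecessor (caller) nodes
    Returns (web, entries), where entries are the nodes with no caller
    inside the web.
    """
    web = set(web)
    changed = True
    while changed:
        changed = False
        for node in list(web):
            callers = pred.get(node, set())
            internal = callers & web
            external = callers - web
            # A node called from both inside and outside the web violates
            # the requirement, so its external callers become web members.
            if internal and external:
                web |= external
                changed = True
    entries = {n for n in web if not (pred.get(n, set()) & web)}
    return web, entries
```

Assuming FIG. 9 contains the edges S to U and T to U, with a hypothetical common caller R, the call `close_web({"S", "U"}, {"R": set(), "S": {"R"}, "T": {"R"}, "U": {"S", "T"}})` adds T to the web and reports S and T as entry nodes.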

The webs for a global variable are identified by using dataflow sets (bit vectors of numeric global identifiers, L_REF, C_REF, and P_REF) at each node of the call graph. These dataflow sets are computed by propagating the local reference information at each node throughout the call graph. After the webs are built, an interference graph is constructed. In an interference graph, each node represents a web and each edge represents an interference between two webs. Two webs interfere with each other if the respective global variables are simultaneously `live` at one or more nodes in the call graph (i.e. if the two webs have common call graph nodes). A global variable is `live` at a node if there are references to that global variable either at that node or in an ancestral or descendant node in the program call graph.
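The propagation step can be sketched as a simple fixed-point iteration. The Python below is a minimal illustration under a stated assumption: L_REF is read as the set of globals referenced locally, C_REF as those referenced in any descendant (child) node, and P_REF as those referenced in any ancestor (parent) node, which the set names suggest. Bit vectors are modelled as Python integers and the function name `propagate_refs` is hypothetical.

```python
def propagate_refs(l_ref, succ):
    """Compute C_REF and P_REF from L_REF by iterating to a fixed point.

    l_ref: node -> int bit vector of locally referenced globals
           (every node of the graph must appear as a key)
    succ:  node -> set of successor nodes
    """
    c_ref = {n: 0 for n in l_ref}
    p_ref = {n: 0 for n in l_ref}
    changed = True
    while changed:
        changed = False
        for n in l_ref:
            for s in succ.get(n, ()):
                # References at or below a callee flow up to the caller.
                new_c = c_ref[n] | l_ref[s] | c_ref[s]
                # References at or above a caller flow down to the callee.
                new_p = p_ref[s] | l_ref[n] | p_ref[n]
                if new_c != c_ref[n]:
                    c_ref[n] = new_c
                    changed = True
                if new_p != p_ref[s]:
                    p_ref[s] = new_p
                    changed = True
    return c_ref, p_ref
```

The iteration terminates because the bit vectors only ever gain bits, and it tolerates cycles in the call graph caused by recursion.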

Note that by definition, two different webs of the same global variable will never interfere with each other. The nodes of the interference graph will be assigned (colored with) one of the reserved callee-saves registers in a manner which avoids assigning the same register to adjacent nodes in the interference graph. Before coloring, the webs are sorted into an order based on heuristics. If a node is colored successfully, then the corresponding web is in effect promoted to a register. Register actions will then be assigned to the nodes comprising a colored web. These register actions are used to direct the register allocator on how to promote the global variable to a register. The register actions for a node could include one or more of the following directives.

1. Make callee-saves register R unavailable for intraprocedural use.

2. Add save and restore code for register R (e.g. instructions 116 and 118 in FIG. 3).

Also, add load and store instructions for global variable G (e.g. instructions 115 and 117 in FIG. 3) to and from callee-saves register R on entry and exit (e.g. for variable g1 in procedure B in FIG. 5).

Note that a load or store instruction can be generated from the information in the program database.

3. All references to global variable G (e.g. g1 in FIG. 5) are to be replaced with references to callee-saves register R (e.g. r17 in FIG. 5).

The register actions for each node are written out to the program database 56 by the program analyzer 54 and later queried and used by the compiler phase 2 (12B in FIG. 4).

To better understand the algorithm described below, consider FIG. 5 and FIG. 6. The nodes of the program call graph 130, labeled A through I, represent procedures and the edges 132, 134, 136, 138, 140, 142, 144, 146, 148 between these nodes represent procedure calls. FIG. 6 describes how three global variables g1, g2, g3 are accessed by the different nodes of the program call graph.

The webs 150, 152, 154, 156 are identified for these three global variables. Note that there are two separate webs 152, 156 for the global variable g2. For this example, all four webs can be promoted using just two callee-saves registers, r17, r18. These callee-saves registers could correspond to the interprocedural registers 110 and 113 shown in FIG. 3.

Different webs for the same variable may be assigned different registers. This is the case for web 4 156 and web 2 152 for global variable g2, which are assigned registers r17 and r18 respectively.

The compiler second phase 12B is responsible for converting memory references to global variables corresponding to promoted webs into register references. For example, memory references to global variable g2 are converted into references to register r17 in procedures C, F, and G. Additionally, at the entry procedure C for web 2 152, code is inserted at the beginning to load the global variable g2 into the register r17. Code is also inserted at the end of procedure C to store back the global variable g2 from register r17 to the memory location in the computer associated with that variable.
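The property that makes such a region a web (the global is accessed in at least one node of the web, and in no ancestor or descendant node lying outside the web) can be checked mechanically. The sketch below is a checker only, not the web construction algorithm; the call graph used in the usage note is hypothetical and is not the graph of FIG. 5, and the names `reachable` and `is_valid_web` are illustrative.

```python
def reachable(start, edges):
    """Nodes reachable from start by following one or more edges."""
    seen = set()
    stack = list(edges.get(start, ()))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(edges.get(n, ()))
    return seen


def is_valid_web(web, succs, accessing):
    """Check the web property for one global variable.

    web:       set of call graph nodes proposed as a web
    succs:     dict mapping each node to the nodes it calls
    accessing: set of nodes that access the global variable
    """
    preds = {}
    for n, outs in succs.items():
        for s in outs:
            preds.setdefault(s, []).append(n)
    if not (web & accessing):
        return False  # the global must be accessed somewhere in the web
    for n in web:
        relatives = reachable(n, succs) | reachable(n, preds)
        if (relatives - web) & accessing:
            return False  # accessed by an ancestor or descendant outside
    return True
```

With the hypothetical graph A→B, A→C, B→D, C→E and a global accessed in B, D and E, both {B, D} and {E} satisfy the check, which illustrates how one variable can give rise to two separate webs.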

Interprocedural Global Register Promotion Algorithm

The following pseudo-code provides guidance for coding the algorithms discussed herein. This pseudo-code is based on the "C" programming language.

Input

A program call graph of `n` nodes. It is assumed that the representation of the call graph allows easy identification of a node's successors and predecessors.

For each node of the call graph, a list of global variables that are explicitly referenced, a set of heuristically assigned reference frequencies for those globals, and the set of global variables (not necessarily a subset of those referenced) whose addresses have been taken or are otherwise ineligible for promotion. This information is computed by the variable usage analyzer 40 shown in FIG. 2.

Output

For each node of the call graph, intraprocedural register actions required to maintain a selected set of singleton global variables in registers within certain regions of the call graph.

Main Data Structures

L_REF [i]

The dataflow set representing Local_References. For each node i, L_REF[i] represents the set of eligible global variables (indices) referenced locally at that node.

P_REF [i]

The dataflow set representing Parent_References. For each node i, P_REF[i] represents the set of eligible global variables (indices) referenced by all parent nodes of node i. A parent node is any node in the call graph from which there is a forward call path (an invocation sequence without procedure returns) to node i.

C_REF [i]

The dataflow set representing Child_References. For each node i, C_REF[i] represents the set of eligible global variables (indices) referenced by all child nodes of node i. A child node is any node in the call graph to which there is a forward call path (an invocation sequence without procedure returns) from node i.

R_Action [i,g]

Register Actions (1..3) at node i for promoted global variable g.

Ref_Count [g,i]

The reference count for variable g at node i.

Global_Table [g]

For each unique eligible global g, there is an entry in the Global_Table. Each entry for a global variable in the Global_Table has a unique index that is its numeric identifier (which is used in the dataflow sets). Each entry also contains the set of call graph nodes which access that global. The organization of this symbol table allows for easy entry, deletion and translation between the global variable name and entry index.

Web_Table [w]

For each web (identified by its entry index in this table), the Web_Table entry contains the numeric global index, the entry nodes of the web, the other nodes of the web, and the total reference count of the global over the nodes in the web.

Web_Interferences [w]

This is the web interference graph. The Web_Interferences entry for each web contains the web numbers of all other webs that it interferes with. ##SPC1##

The remaining pseudo-code routines are not specified in detail because they use generally available graph coloring algorithms. ##SPC2##
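Since the coloring routines are left to generally available algorithms, the following is one possible sketch and not the routine of the specification. It assumes each web is given as a set of call graph nodes, uses the total reference count as the only sorting heuristic, and colors greedily in that order; the names `build_interference` and `color_webs` and the webs in the usage note are hypothetical.

```python
def build_interference(webs):
    """webs: dict web_id -> set of call graph nodes.

    Two webs interfere when they have at least one node in common.
    """
    ids = list(webs)
    interf = {w: set() for w in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if webs[a] & webs[b]:
                interf[a].add(b)
                interf[b].add(a)
    return interf


def color_webs(webs, ref_count, registers):
    """Greedily assign registers to webs, most referenced web first.

    Returns a dict web_id -> register holding only the promoted webs;
    a web that finds no free register is simply left unpromoted.
    """
    interf = build_interference(webs)
    order = sorted(webs, key=lambda w: ref_count[w], reverse=True)
    assigned = {}
    for w in order:
        taken = {assigned[x] for x in interf[w] if x in assigned}
        for r in registers:
            if r not in taken:
                assigned[w] = r
                break
    return assigned
```

Because two webs of the same global variable never share a node, they never appear as neighbors in the interference graph built this way, and they may or may not receive the same register.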

Register Spill Optimization

Software procedure calling conventions designate two classes of registers, callee-saves and caller-saves registers. Callee-saves registers that are used must be spilled to memory at procedure entry points and restored at exit points. These registers may be used to keep live values across calls to other procedures. Caller-saves registers cannot hold live values across calls, so their contents must be temporarily saved in memory if those contents are needed after the call.

The simple idea behind register spill optimization is to move callee-saves register spill code upwards in the call graph so that descendant nodes may use the registers "for free", hence the term spill code motion.

The method according to the invention has the following features:

• Regions of the call graph called clusters are identified over which spill code motion may be effective.

• The callee-saves registers are pre-allocated to nodes of the cluster by the program analyzer. This allows the register allocator to have some knowledge of every procedure's register usage without forcing the procedures to be compiled in any particular order.

• Calls may still be made to procedures that are not known until link time or load time, as long as those procedures follow the standard register usage conventions.

• In some procedures, a larger number of caller-saves registers will be available than is allowed by the standard convention.

• The available set of callee-saves registers should get utilized more efficiently.

Clusters

Clusters are identified for two reasons. First, to identify the nodes where it is safe and correct to execute spill code for other procedures. Second, to execute spill code relatively infrequently in order to reduce overhead and achieve a performance improvement.

Conceptually, a cluster is a collection of nodes in the call graph that can be viewed as a single entity with regard to register allocation. The procedure calling convention will be adhered to at the boundary of a cluster, but not internally. The idea is to have some nodes within the cluster be able to use callee-saves registers without incurring the expense of saving and restoring them on entry and exit respectively. If such nodes within the cluster are heavily called, then a measurable performance improvement should result.

Breaking up an entire call graph into disjoint clusters of nodes enables more effective pre-allocation of callee-saves registers within smaller regions. Without the notion of clusters, one would run out of the few callee-saves registers quickly in large programs. Regardless of which way callee-saves registers are pre-allocated (top-down or bottom-up), certain regions of the call graph would be deprived of any benefit (or possibly even impacted negatively).

Ideally clusters should be shallow with many nodes close to the cluster root node. In addition, the root node should be invoked less frequently than the internal nodes. Propagating up the callee-spill code within the cluster to the root node should then speed up the application. One is less likely to run out of callee-saves registers for clusters that are short. Additionally, wide clusters will allow the allocation of the same set of callee-saves registers to many sibling nodes within the cluster.

Finally, partitioning the call graph into many different small clusters is preferable to having a few large clusters. This should allow a more uniform distribution of entry-spill code across the entire call graph. Also, the pre-allocation algorithm can try to move entry-spill code across clusters if desired.

Cluster Definition

A cluster in the call graph is defined as a set of nodes with the following properties:

[1] There exists some node R (e.g. A in FIG. 7), called the root of the cluster, which dominates all other nodes within the cluster. (Node D dominates node N if and only if every path from any start node to N includes D.) Note that this does not imply that all nodes dominated by the root node are in the cluster.

[2] For every node P (e.g. E in FIG. 7) in the cluster except R, all immediate predecessors of P (e.g. B in FIG. 7) are also in the cluster.

[3] A non-root node P is included only in the cluster of the immediately dominating root node.

For spill code motion to improve performance, the root node of a cluster should be called less frequently than the internal nodes of the cluster. This can be estimated by associating weights with the edges of the call graph which indicate relative call frequencies. These weights can be heuristically derived by the compiler first phase, but they may be more accurately assigned by using profile information.
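The use of edge weights for judging a candidate root can be illustrated with a simple test. The rule below is purely an assumption made for the sake of example and is not the root selection criterion of the specification: a node is treated as a root candidate when the total weight of its outgoing calls exceeds the total weight of its incoming calls. The name `is_a_root` and the procedure names in the usage note are hypothetical.

```python
def is_a_root(node, weight):
    """Hypothetical root test based on relative call frequencies.

    weight: dict mapping (caller, callee) edges to a relative frequency.
    A node qualifies when it makes calls more often than it is called.
    """
    incoming = sum(w for (src, dst), w in weight.items() if dst == node)
    outgoing = sum(w for (src, dst), w in weight.items() if src == node)
    return outgoing > incoming
```

For example, a procedure `loop` that is called once but calls `work` one hundred times would qualify, so the spill code hoisted into `loop` would run once rather than on each of the hundred calls to `work`.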

Root nodes are selected by examining the incoming and outgoing calls at each node. The successors of a root node are added to the cluster by applying conditions [1] and [2]. Note that condition [3] allows a leaf node of one cluster to also be the root node of another cluster (e.g. C and D in FIG. 7).
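Growing a cluster from a chosen root by applying conditions [1] and [2] can be sketched as follows. This is an illustrative sketch under stated assumptions, not the cluster identification routine of the specification: the set of root nodes is taken as given, dominators are computed with the standard iterative set formulation, and growth stops at any other root, which may therefore appear only as a leaf of the cluster. The names `dominators` and `grow_cluster` and the graph in the usage note are hypothetical.

```python
def dominators(succs, start):
    """Return (dom, preds): dominator sets and immediate predecessors."""
    nodes = set(succs)
    preds = {n: set() for n in nodes}
    for n, outs in succs.items():
        for s in outs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            ps = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds


def grow_cluster(root, roots, succs, start):
    """Collect the cluster rooted at root.

    A successor s joins when the root dominates s (condition [1]) and
    every immediate predecessor of s is already a member (condition [2]).
    """
    dom, preds = dominators(succs, start)
    cluster = {root}
    changed = True
    while changed:
        changed = False
        for n in list(cluster):
            if n != root and n in roots:
                continue  # another root is only a leaf of this cluster
            for s in succs[n]:
                if s not in cluster and root in dom[s] and preds[s] <= cluster:
                    cluster.add(s)
                    changed = True
    return cluster
```

In a hypothetical graph R→X, R→Y, X→Z, Y→Z, Y→W, W→V with roots R and W, the cluster of R comes out as {R, X, Y, Z, W} and the cluster of W as {W, V}, so W is at once a leaf of one cluster and the root of another.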

Register Pre-Allocation

The register need analyzer of the compiler first phase can communicate the approximate callee-saves register requirements to the Program Analyzer by performing normal intraprocedural register allocation on the intermediate code generated. Based on this information for each node of a cluster, registers are allocated within each cluster starting at the root of the cluster and working downwards.

In effect, the callee-saves registers are split into four classes which describe how each register can be used within each procedure. The classes are defined by two conditions: 1) whether or not the register must be spilled on entry and restored on exit when it is used in the procedure and 2) whether or not the register can be used to hold values across procedure calls. These classes are identified by the following sets:

FREE[P]--registers in this set need not be spilled if they are used, and may hold live values across calls. These are essentially the interprocedural registers.

CALLER_SAVES[P]--registers in this set need not be spilled if they are used, but may not hold live values across calls.

CALLEE_SAVES[P]--registers in this set must be spilled if they are used, and may hold live values across calls.

MSPILL[P]--registers in this set must be spilled if they are used and they may not hold live values across calls.

When register assignment is done by the register allocator 62 (FIG. 3) for a procedure, the register allocator will query the program database 56 to identify which set each machine register belongs to, and replace each pseudo-register 102, 104, 106 with a machine register 112, 113, 114 from the appropriate set. The register allocator 62 must also add spill code 116, 118 for those machine registers assigned from the CALLEE_SAVES and MSPILL sets.

There is an additional requirement in the register allocator that all registers in the MSPILL set at a cluster root procedure must be spilled on entry and restored on exit (spill code 116 and 118 in FIG. 3), regardless of whether or not they are actually used inside that procedure. This will accomplish the goal of having the root node execute the spill code for the remaining nodes of the cluster. The algorithms described hereinafter arrange that MSPILL will always be empty at non-root nodes belonging to a cluster.
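The four classes and the resulting spill decision can be summarized in a short sketch. It assumes the four register sets of a procedure are supplied as Python sets keyed by name; the function names `register_class` and `registers_to_spill` and the register names in the usage note are hypothetical.

```python
# (must be spilled if used, may hold live values across calls) -> class
CLASSES = {
    (False, True): "FREE",
    (False, False): "CALLER_SAVES",
    (True, True): "CALLEE_SAVES",
    (True, False): "MSPILL",
}


def register_class(must_spill, live_across_calls):
    """Name the class selected by the two defining conditions."""
    return CLASSES[(must_spill, live_across_calls)]


def registers_to_spill(used, sets, is_cluster_root):
    """Registers needing spill code on entry and restore code on exit.

    used: registers actually assigned in the procedure
    sets: dict with keys "CALLEE_SAVES" and "MSPILL" holding register sets
    """
    spill = set(used) & (sets["CALLEE_SAVES"] | sets["MSPILL"])
    if is_cluster_root:
        spill |= sets["MSPILL"]  # spilled whether or not the root uses them
    return spill
```

A cluster root that uses only r3 from CALLEE_SAVES, with MSPILL equal to {r4, r6}, would thus spill r3, r4 and r6, the last two on behalf of the other nodes of its cluster.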

The algorithms described hereinafter also eliminate the possibility of recursive call cycles occurring within a cluster. Consider the simple case of a self-recursive routine which uses callee-saves registers. Such a routine would expect that the values in these registers would remain safe across the recursive call. If spill code is not executed at the entry point, however, the values that were live across that recursive call will be destroyed.

Recursive cycles are prevented from occurring within clusters. However, entire clusters may occur within cycles in the call graph.

If there exists a node within a cluster that makes a call back to the root node, pre-allocation will not be done on that cluster. This is done for performance reasons.

The pre-allocation technique herein proposed has one major vulnerability. In particular, if a selected cluster root node is a procedure that is called more frequently than the other nodes of the cluster, calls to that procedure would be slowed down.

FIG. 7 shows three clusters 158, 160, 162 that might be identified for the program call graph 130. Cluster 158 comprises the nodes A, B, C, D, and E. Cluster 160 comprises the nodes D and E. Cluster 162 comprises the nodes C, F, G, and I. Note that nodes D and C are cluster root nodes that are themselves part of cluster 158. Node A is the root node for cluster 158.

Spill Optimization Algorithms

Input: A call graph G of `n` nodes. It is assumed that the representation of the call graph allows easy identification of a node's successors and predecessors.

Output: Register sets MSPILL, CALLEE_SAVES, CALLER_SAVES, and FREE for each node of G to be used by the register allocator. This information is stored in the program database 56.

The following data structures are used to help identify clusters.

Data Structures

visited [1..n]

Flag to mark a node as having been examined.

dom [1..n]

For each node i, dom[i] represents the set of nodes that dominate it.

cluster [1..n]

For each ROOT node i, cluster[i] lists the set of associated cluster nodes.

AVAIL [1..n]

For each node i, AVAIL[i] represents a set of registers that will be used in the register preallocation process. ##SPC3##
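A top-down pre-allocation over one cluster might proceed along the following lines. This is a simplified sketch under assumptions of its own and does not reproduce the routines of the specification: the cluster is acyclic, each non-root node claims as many registers as it needs from those still unclaimed along every path from the root (playing the role of AVAIL), the claimed registers form that node's FREE set, and the root's MSPILL set is the union of all registers so claimed. The root's own register need and the CALLER_SAVES sets are not modeled. The name `preallocate` is hypothetical.

```python
def preallocate(root, cluster, succs, need, callee_saves):
    """Return (free, mspill) for one acyclic cluster.

    need:         dict node -> number of callee-saves registers wanted
    callee_saves: list of register names available to the cluster
    """
    preds = {n: [p for p in cluster if n in succs.get(p, ())] for n in cluster}
    left = {root: list(callee_saves)}  # still unclaimed below each node
    free = {}
    pending = set(cluster) - {root}
    while pending:
        ready = sorted(n for n in pending
                       if all(p in left for p in preds[n]))
        if not ready:
            raise ValueError("cluster contains a cycle")
        for n in ready:
            # Usable only if unclaimed along every incoming path.
            avail = [r for r in callee_saves
                     if all(r in left[p] for p in preds[n])]
            free[n] = avail[:need[n]]
            left[n] = avail[need[n]:]
            pending.discard(n)
    mspill = sorted(set().union(*free.values()))
    return free, mspill
```

In this sketch sibling nodes start from the same unclaimed set and may therefore receive the same registers, which corresponds to the advantage of wide clusters noted earlier, while a node called from two siblings can claim only registers that neither sibling holds.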

An Example

Consider the sample call graph 170 shown in FIG. 8. The nodes of this sample call graph are interconnected by directional edges 180, 182, 184, 186, 188, 190. Assume that the depth-first search order of the nodes of this call graph is J, K, L, M, N, P. The depth-first search order is the order in which these nodes are first considered by the algorithm.

Suppose that the function Is_A_Root() returns true for nodes J and M. After visiting node J, Examine_node() will be called on node K. At this point, however, the decision on whether or not to add node K to the cluster rooted at node J will be postponed. This is because node P has not yet been visited.

If node K is added to the cluster rooted at J, and P is added to the cluster rooted at M, condition [2] from the cluster definition will be violated. (Node K will have an immediate predecessor not in its cluster.)

The invention is practiced as follows:

Summary files 48, 50, 52 are produced for each of the source files 18, 20, 22. The program analyzer 54 is invoked specifying the names of the summary files as command line options. The program analyzer 54 reads each of the summary files, constructs the program call graph 130, computes the register actions needed to do global variable promotion and the register sets (MSPILL, FREE, CALLER_SAVES, CALLEE_SAVES) for spill code motion, and finally writes these register actions and register sets out to the program database 56.

The program analyzer 54 performs global register promotion and register spill optimization independently. Interprocedural registers reserved for global promotion are not available for pre-allocation.

After the program database 56 is created, the intermediate files (or the source files) are read by the compiler second phase 12B along with the record of register actions and register sets from the program database for each procedure being compiled. The register actions are then applied to each procedure by the register allocator 62 and object files 120 are created. The object files 120 thus created are read by the program linker 17 and linked with the run-time libraries 19 to produce an executable file 24 with interprocedural register allocations.

File Structure

The logical structure of the summary files is described below using Backus Normal Form (BNF) notation.

    ______________________________________
    <Summary_File>             ::= <Procedure_Record>*
    <Procedure_Record>         ::= Procedure_Name +
                                   Num_Callee_Regs_Needed +
                                   <Callee_Info_Record>* +
                                   <Global_Usage_Info_Record>* +
                                   <Plabel_Record>*
    <Callee_Info_Record>       ::= Callee_Name +
                                   Static_Call_Count
    <Global_Usage_Info_Record> ::= Global_Variable_Name +
                                   Global_Attributes +
                                   Weighted_Reference_Count +
                                   Num_References +
    <Global_Attributes>        ::= Global_Type |
                                   Static_Variable? |
                                   Address_Taken?
    <Plabel_Record>            ::= Indirectly_callable_procedure_name
    ______________________________________

The variable usage analyzer 40 computes the Global_Usage_Info_Record. The procedure call analyzer 44 computes the Callee_Info_Record and Plabel_Record. The register need analyzer computes the Num_Callee_Regs_Needed field of the Procedure_Record. Finally, the summary file generator 46 organizes all the fields of the Procedure_Record and writes it out to the summary file 48, 50, 52 of the corresponding source code file 18, 20, 22.
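An in-memory form of one summary file record might look like the sketch below. The field names follow the BNF above, but the field types, the modeling of the global attributes as three separate fields, and the helper `eligible_globals` are assumptions made for illustration; the on-disk encoding is not specified here. The helper applies only the address-taken criterion, whereas the specification also allows for globals that are otherwise ineligible.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CalleeInfoRecord:
    callee_name: str
    static_call_count: int


@dataclass
class GlobalUsageInfoRecord:
    global_variable_name: str
    global_type: str            # attribute fields: assumed representation
    static_variable: bool
    address_taken: bool
    weighted_reference_count: int
    num_references: int


@dataclass
class ProcedureRecord:
    procedure_name: str
    num_callee_regs_needed: int
    callees: List[CalleeInfoRecord] = field(default_factory=list)
    globals_used: List[GlobalUsageInfoRecord] = field(default_factory=list)
    plabels: List[str] = field(default_factory=list)  # indirectly callable


def eligible_globals(record):
    """Names of globals in the record whose address has not been taken."""
    return {g.global_variable_name
            for g in record.globals_used if not g.address_taken}
```

A program analyzer built along these lines could form the local reference set of a call graph node from `eligible_globals` of its procedure record, after also removing any global whose address is taken in some other procedure.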

The logical structure of the program database file is described below using the same BNF notation.

    ______________________________________
    <Database_File>     ::= <Procedure_Record>*
    <Procedure_Record>  ::= <Register_Set> +
                            <Reg_Action_Record>
    <Register_Set>      ::= MSPILL +
                            FREE +
                            CALLER_SAVES +
                            CALLEE_SAVES
    <Reg_Action_Record> ::= Reg_Action [1..3] +
                            Global_Name +
                            Interprocedural Register
    ______________________________________

The invention has now been described with respect to specific embodiments. Other embodiments will be apparent to those of ordinary skill in this art upon reference to this description. It is therefore not intended that this invention be limited, except as limited by the appended claims.

What is claimed is:
 1. A method for optimizing register usage in an executable computer program on a computer processor having a limited plurality of machine registers, said computer program being compiled from a plurality of individual source code files, said method comprising the steps of: reading said individual source code files having high-level program language text reciting a plurality of procedures, said source code files being read one at a time; determining syntactic and semantic correctness of each said source code file; translating each said source code file into an intermediate representation and generating therefrom an intermediate representation file; collecting local information about usage of global variables from each said source code file, wherein a global variable is a named storage location the contents of which can be stored in a single machine register and is accessible from a plurality of procedures; estimating need of registers for each procedure from each said intermediate representation; and constructing a record of said register need and said global variable usage and calls to procedures for each procedure in a summary file for each said source code file.
 2. The method according to claim 1 further including the steps of: computing, in a program analyzer, interprocedural register allocation optimization (IRAO) information from all said summary files to be carried out as transformations by subsequent compiler processes; and storing said IRAO information in a program database file for use by said subsequent compiler processes.
 3. The method according to claim 2 further including the steps of: generating profile information about execution of said computer program from a previous compilation of said source code files and execution of said computer program by determining frequency of execution of at least said procedures; and supplying said profile information to said program analyzer to aid in the said computing of said IRAO information.
 4. The method according to claim 2 further including the steps of: transforming each said intermediate representation file into a sequence of machine instructions for each said procedure, each said sequence of machine instructions employing a plurality of pseudo-registers; and implementing intraprocedural register allocation and interprocedural register allocation optimization on said sequence of machine instructions and based on said IRAO information accessed from said program database file.
 5. The method according to claim 4 wherein said IRAO information computing step includes partitioning said machine registers between interprocedural registers and intraprocedural registers and wherein said implementing step comprises the steps of: mapping first selected ones of said pseudo-registers into said limited plurality of said intraprocedural machine registers; and mapping second selected ones of said pseudo-registers into said limited plurality of said interprocedural machine registers in accordance with said IRAO information.
 6. The method according to claim 2 wherein said computing step of the program analyzer comprises: constructing a program call graph (PCG) from all said records, said PCG comprising a set of nodes, each one of said nodes representing one of said procedures interconnected by directional edges, each said directional edge representing a call from a first procedure to a second procedure; creating webs on said PCG for selected ones of said global variables, wherein a web for a single global variable is a collection of said nodes such that said global variable is accessed in at least one node of said web and such that for each node in said web, said global variable is not accessed in any ancestor node not in said web, and said global variable is not accessed by any descendant node not in said web, and wherein a plurality of webs may be created for a single global variable; prioritizing said webs according to frequency of use of said global variable within nodes of said web; assigning a first available one of said machine registers as an interprocedural machine register to first selected ones of said webs according to said prioritizing step, wherein no two of said selected webs having a node in common can be assigned the same machine register; and assigning further available ones of said machine registers as interprocedural machine registers to further selected ones of said webs according to said prioritizing step until all of said available ones of machine registers are assigned or until all selected ones of said webs have been assigned an available machine register.
 7. The method according to claim 2 wherein said computing step of the program analyzer comprises: constructing a program call graph (PCG) from all said records, said PCG comprising a set of nodes, each one of said nodes representing one of said procedures interconnected by directional edges, each said directional edge representing a call from a first procedure to a second procedure; creating clusters on said PCG, wherein a cluster is a collection of said nodes such that there exists a unique root node of said cluster only through which every other node in said cluster can be called, to obtain a cluster organization; partitioning, for each said cluster, said machine registers into interprocedural registers and intraprocedural registers for each of the said nodes within said clusters according to said register need and as restricted by said cluster organization; and designating, for each said cluster, that a cluster root node execute machine instructions to preserve values of said interprocedural registers used within said cluster upon calls to said cluster root node so that other nodes within said cluster need not execute said machine instructions.
 8. The method according to claim 6 wherein said computing step of the program analyzer comprises: constructing a program call graph (PCG) from all said records, said PCG comprising a set of nodes, each one of said nodes representing one of said procedures interconnected by directional edges, each said directional edge representing a call from a first procedure to a second procedure; creating clusters on said PCG, wherein a cluster is a collection of said nodes such that there exists a unique root node of said cluster only through which every other node in said cluster can be called, to obtain a cluster organization; partitioning, for each said cluster, said machine registers into interprocedural registers and intraprocedural registers for each of the said nodes within said clusters according to said register need and as restricted by said cluster organization; and designating, for each said cluster, that said cluster root node execute machine instructions to preserve the values of said interprocedural registers used within said cluster upon calls to said cluster root node so that other nodes within said cluster need not execute said machine instructions.
 9. A method for optimizing register usage in an executable computer program on a computer processor having a limited plurality of machine registers, said computer program being compiled from a plurality of individual source code files, said method comprising the steps of: reading said individual source code files having high-level program language text reciting a plurality of procedures, said source code files being read one at a time; determining syntactic and semantic correctness of each said source code file; translating each said source code file into an intermediate representation; collecting local information about usage of global variables from each said source code file, wherein a global variable is a named storage location the contents of which can be stored in a single machine register and is accessible from a plurality of procedures; estimating need of registers for each procedure from each said intermediate representation; and constructing a record of said register need and said global variable usage and calls to procedures for each procedure in a summary file for each said source code file.
 10. The method according to claim 9 further including the steps of: transforming each said source file into a sequence of machine instructions for each said procedure, each said sequence of machine instructions employing a plurality of pseudo-registers; and implementing intraprocedural register allocation and interprocedural register allocation optimization on said sequence of machine instructions and based on said IRAO information accessed from said program database file.
 11. An apparatus for optimizing register usage in an executable computer program on a computer processor having a limited plurality of machine registers, said computer program being compiled from a plurality of individual source code files, said apparatus comprising: means for reading said individual source code files having high-level program language text reciting a plurality of procedures, said source code files being read one at a time; means coupled to said reading means for determining syntactic and semantic correctness of each said source code file; means coupled to said determining means for translating each said source code file into an intermediate representation and generating therefrom an intermediate representation file; means coupled to said translating means for collecting local information about usage of global variables from each said source code file, wherein a global variable is a named storage location the contents of which can be stored in a single machine register and is accessible from a plurality of procedures; means coupled to said collecting means for estimating need of registers for each procedure from each said intermediate representation; and means coupled to said estimating means, to said collecting means, and to said translating means for constructing a record of said register need and said global variable usage and calls to procedures for each procedure in a summary file for each said source code file.
 12. An apparatus for optimizing register usage in an executable computer program on a computer processor having a limited plurality of machine registers, said computer program being compiled from a plurality of individual source code files, said apparatus comprising: means for reading said individual source code files having high-level program language text reciting a plurality of procedures, said source code files being read one at a time; means coupled to said reading means for determining syntactic and semantic correctness of each said source code file; means coupled to said determining means for translating each said source code file into an intermediate representation; means coupled to said translating means for collecting local information about usage of global variables from each said source code file, wherein a global variable is a named storage location the contents of which can be stored in a single machine register and is accessible from a plurality of procedures; means coupled to said collecting means for estimating need of registers for each procedure from each said intermediate representation; and means coupled to said estimating means, to said collecting means, and to said translating means for constructing a record of said register need and said global variable usage and calls to procedures for each procedure in a summary file for each said source code file.
 13. The apparatus according to claim 12 further comprising: a program analyzer means for computing interprocedural register allocation optimization (IRAO) information from all said summary files to be carried out as transformations by subsequent compiler processes; and means for storing said IRAO information in a program database file for use by said subsequent compiler processes.
 14. The apparatus according to claim 13 further comprising: means for transforming each said source code file into a sequence of machine instructions for each said procedure, each said sequence of machine instructions employing a plurality of pseudo-registers; and means coupled to said transforming means for implementing intraprocedural register allocation and interprocedural register allocation optimization on said sequence of machine instructions and based on said IRAO information accessed from said program database file.
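By way of illustration only, and not as a limitation of the claims, the per-procedure record of claims 11 and 12 and its use by the program analyzer of claim 13 might be sketched as follows. The names `ProcedureSummary`, `merge_summary_files`, and `build_call_graph` are hypothetical, as is the assumption that each summary file is simply a list of such records; none of these appear in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcedureSummary:
    """Hypothetical per-procedure record stored in a summary file."""
    name: str
    register_need: int                      # estimated number of registers needed
    global_accesses: Dict[str, int] = field(default_factory=dict)  # global name -> access count
    callees: List[str] = field(default_factory=list)               # procedures called


def merge_summary_files(summary_files: List[List[ProcedureSummary]]) -> Dict[str, ProcedureSummary]:
    """Combine the records of every summary file, keyed by procedure name."""
    merged: Dict[str, ProcedureSummary] = {}
    for records in summary_files:
        for rec in records:
            if rec.name in merged:
                raise ValueError(f"procedure {rec.name!r} defined in more than one file")
            merged[rec.name] = rec
    return merged


def build_call_graph(merged: Dict[str, ProcedureSummary]) -> Dict[str, List[str]]:
    """Build caller -> callees edges, keeping only callees that have a summary record."""
    return {
        name: [callee for callee in rec.callees if callee in merged]
        for name, rec in merged.items()
    }
```

In this sketch, calls to procedures with no summary record (library routines, for example) are dropped from the graph; an actual analyzer would need a policy for such calls.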
 15. A method of operating a general purpose data processor having a plurality of machine registers, a sub-set thereof being assigned for use as interprocedural registers, so as to allow more efficient allocation of said interprocedural registers when said data processor is executing a computer program comprising a plurality of procedures, at least one of said procedures operating on a global variable, said method comprising the steps of: building a program call graph, said program call graph comprising a set of nodes, each said node representing a procedure, interconnected by directional edges to other said nodes, each said edge representing a call from a first procedure to a second procedure, the node representing said first procedure being the ancestor of the node representing said second procedure and the node representing said second node being the descendant of the node representing said first procedure; defining webs corresponding to global variables, each said web corresponding to a global variable, each said web comprising a collection of program call graph nodes such that said corresponding global variable is accessed in at least one node in said web and such that, for each node in said web, said corresponding global variable is not accessed in any ancestor node not in said web, and said global variable is not accessed by a descendant node not in said web; determining the order for said webs; and assigning said global variables to interprocedural machine registers according to the order of said webs corresponding to said global variables in said determined order, wherein said selected global variables comprise said global variables that are eligible for assignment to an interprocedural machine register.
 16. A method of operating a general purpose data processor having a plurality of machine registers, a sub-set thereof being assigned for use as interprocedural registers, so as to allow more efficient allocation of said interprocedural registers when said data processor is executing a computer program comprising a plurality of procedures, at least one of said procedures operating on a global variable, said method comprising the steps of: building a program call graph, said program call graph comprising a set of nodes, each said node representing a procedure, interconnected by directional edges to other said nodes, each said edge representing a call from a first procedure to a second procedure, the node representing said first procedure being the ancestor of the node representing said second procedure and the node representing said second node being the descendant of the node representing said first procedure; defining webs corresponding to global variables, each said web corresponding to a global variable, each said web comprising a collection of program call graph nodes such that said corresponding global variable is accessed in at least one node in said web and such that, for each node in said web, said corresponding global variable is not accessed in any ancestor node not in said web, and said global variable is not accessed by a descendant node not in said web; determining the order for said webs; and assigning said global variables to interprocedural machine registers according to the order of said webs corresponding to said global variables in said determined order, wherein said order determined for said webs is determined by the frequency of use of the global variable corresponding to each said web.
 17. A method of operating a general purpose data processor having a plurality of machine registers, a sub-set thereof being assigned for use as interprocedural registers, so as to allow more efficient allocation of said interprocedural registers when said data processor is executing a computer program comprising a plurality of procedures, at least one of said procedures operating on a global variable, said method comprising the steps of: building a program call graph, said program call graph comprising a set of nodes, each said node representing a procedure, interconnected by directional edges to other said nodes, each said edge representing a call from a first procedure to a second procedure, the node representing said first procedure being the ancestor of the node representing said second procedure and the node representing said second node being the descendant of the node representing said first procedure; defining webs corresponding to global variables, each said web corresponding to a global variable, each said web comprising a collection of program call graph nodes such that said corresponding global variable is accessed in at least one node in said web and such that, for each node in said web, said corresponding global variable is not accessed in any ancestor node not in said web, and said global variable is not accessed by a descendant node not in said web; determining the order for said webs; and assigning said global variables to interprocedural machine registers according to the order of said webs corresponding to said global variables in said determined order, wherein said order determined for said webs is determined from profile information collected by executing said program with exemplary input data on a data processing system capable of running said program.
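By way of illustration only, the web condition recited in claims 15 through 17 and an ordering by frequency of use might be sketched as follows. The function names are hypothetical. The sketch also assumes, as a simplification not stated in the claims, that two webs conflict for a register only when they share a call graph node, and that webs are processed in descending order of a caller-supplied frequency estimate (which could come from static counts or from profile information).

```python
def reachable(graph, start):
    """Return every node reachable from start by one or more edges."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def reverse(graph):
    """Return the call graph with every edge direction flipped."""
    rev = {node: [] for node in graph}
    for caller, callees in graph.items():
        for callee in callees:
            rev.setdefault(callee, []).append(caller)
    return rev


def is_web(graph, accessors, nodes):
    """Check the web condition for one global variable.

    accessors: set of procedures that access the variable.
    nodes: candidate set of call graph nodes.
    """
    nodes = set(nodes)
    if not nodes & accessors:
        return False  # the variable must be accessed somewhere in the web
    rev = reverse(graph)
    for node in nodes:
        # ancestors and descendants of this node that lie outside the web
        outside = (reachable(graph, node) | reachable(rev, node)) - nodes
        if outside & accessors:
            return False
    return True


def assign_registers(webs, registers):
    """Greedy assignment. webs: list of (web_id, nodes, frequency) tuples."""
    assignment, placed = {}, []
    for web_id, nodes, _freq in sorted(webs, key=lambda w: w[2], reverse=True):
        nodes = set(nodes)
        # ASSUMPTION: webs conflict only if they share a call graph node
        busy = {reg for other, reg in placed if other & nodes}
        free = [reg for reg in registers if reg not in busy]
        if free:
            assignment[web_id] = free[0]
            placed.append((nodes, free[0]))
    return assignment
```

Under these assumptions, two webs occupying disjoint regions of the call graph can receive the same register, while a lower-frequency web that overlaps an already placed web is left in memory when no register remains.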
 18. A method of operating a general purpose data processor having a plurality of machine registers, a sub-set thereof being assigned for use as interprocedural registers, so as to allow more efficient allocation of said interprocedural registers when said data processor is executing a computer program comprising a plurality of procedures, at least one of said procedures operating on a global variable, said method comprising the steps of: building a program call graph, said program call graph comprising a set of nodes, each said node representing a procedure, interconnected by directional edges to other said nodes, each said edge representing a call from a first procedure to a second procedure, the node representing said first procedure being the ancestor of the node representing said second procedure and the node representing said second node being the descendant of the node representing said first procedure; defining webs corresponding to global variables, each said web corresponding to a global variable, each said web comprising a collection of program call graph nodes such that said corresponding global variable is accessed in at least one node in said web and such that, for each node in said web, said corresponding global variable is not accessed in any ancestor node not in said web, and said global variable is not accessed by a descendant node not in said web; determining the order for said webs; and assigning said global variables to interprocedural machine registers according to the order of said webs corresponding to said global variables in said determined order, said method further comprising the step of identifying a procedure into which code is to be inserted, said code causing the contents of one of said interprocedural registers to be stored in a location in said data processing system different from said interprocedural register upon entry into said procedure thereby freeing said interprocedural register for use in storing a different said global variable.
 19. The method of claim 18 wherein said step of identifying a procedure comprises identifying clusters of nodes in said program call graph, each said cluster comprising a set of connected nodes having a root node such that every other node in the cluster can be called through said unique root node.
 20. The method of claim 19 wherein said clusters are identified by using profile information collected by executing said program with exemplary input data on a data processing system capable of running said program.
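By way of illustration only, the cluster condition of claim 19 and the movement of register saves to the cluster root might be sketched as follows. The function names `is_cluster` and `hoist_spills` are hypothetical. The sketch reads the claim as requiring that no procedure outside the cluster calls any member other than the root, which is an interpretive assumption, and it does not model the profile information of claim 20.

```python
def is_cluster(graph, root, members):
    """Check whether members form a cluster rooted at root.

    ASSUMPTION: a non-root member may only be called from inside the
    cluster, so every call path from outside enters through the root.
    """
    members = set(members)
    if root not in members:
        return False
    for caller, callees in graph.items():
        if caller not in members:
            if any(c in members and c != root for c in callees):
                return False  # an outside caller bypasses the root
    # every member must be reachable from the root without leaving the cluster
    seen, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for callee in graph.get(node, ()):
            if callee in members and callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen == members


def hoist_spills(root, members, regs_used):
    """Move the register saves of all cluster members to the root.

    regs_used: procedure -> set of registers it would otherwise save.
    Returns procedure -> set of registers saved after the motion.
    """
    hoisted = set().union(*(regs_used.get(m, set()) for m in members))
    return {m: (hoisted if m == root else set()) for m in members}
```

The motion is only worthwhile when the root executes less often than the members it covers; choosing such roots is where frequency estimates or profile information would enter, and that choice is outside this sketch.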